Abstract

We use a Bayesian approach to optimally solve problems in noisy binary search. We deal with two variants: 

• Each comparison can be erroneous with some probability $1 - p$.

• At each stage $k$ comparisons can be performed in parallel and a noisy answer is returned.

We present a (classic) algorithm which optimally solves both variants together, up to an additive term of $O(\log\log(n))$, and
prove matching information theoretic lower bounds. We use the algorithm to improve the results of Farhi et al. [FGGS99],
presenting a quantum (error free) search algorithm in an ordered list of expected complexity less than $(\log_2 n)/3$.

1 Introduction 

Noisy binary search has been studied extensively (see [KMRSW80, Pel89, AD91, DGW92, BK93, FRPU94, Asl95, Mut94,
Orr96, Ped99, Pel02]). The basic model begins with an array of $n$ elements. We are given a special element $s$, and try
to find its rank in the array. Every query consists of comparing s to one of the elements. One can add noise by making 
each comparison (or query) return the wrong result with probability $1 - p$. One can also think of an adversarial model
in which an adversary is allowed to choose whether the algorithm gets the right answer. Our work focuses on the noisy
non-adversarial model.

Practical uses for optimal noisy search can occur (for example) in biology. A simple application is eye tests, which can 
be considered as comparing our sight capability to fixed benchmarks (determined by the size of the letters we are trying to 
see). Other (more complex) possible applications are trying to determine the supermolecular organization of protein
complexes and isolating active proteins in their native form [SCJ94, HEWJB04]. In both cases, the 3-dimensional conformation
of the proteins should be conserved, and solubilization methods are based on different percentages of mild detergents.
Further, the separation of the above molecules is based on different percentages of acrylamide and bisacrylamide. Determining
the right percentage can be done by noisy binary search, running a gel for each query. 

An interesting theoretical use is another way to derive the results of [KK07]. They present a sophisticated algorithm
to insert a coin with an unknown bias into a list of coins with increasing bias (which is also unknown). In order to use our
algorithm, we need a way to compare coins (an oracle). By using the clever reduction of [KK07] we can always assume
that one of them is unbiased. We can therefore flip both coins together until they give different results. We then
assume that the coin that got heads has the higher bias towards heads, and consider this to be a noisy query (the exact noise
depends on parameters of the problem which are given in [KK07]).

Generalizing binary search (without noise) when k questions can be asked in parallel and then answered together is 
trivial. The algorithm is simply to divide the array into k + 1 equal parts, and ask in which of the parts is the element we 
are looking for. This model, and its noisy variant, are important (for example) when one can send a few queries in a single 
data packet, or when one can ask the second query before getting an answer to the first. 
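
The noiseless batched search described above can be sketched as follows. This is only an illustration under stated assumptions: the list is ascending with distinct values, `s` is not in the list, and the function name `batched_search` and its return convention are hypothetical (they do not come from the paper). Each round issues at most $k$ comparisons that split the remaining candidate ranks into at most $k + 1$ nearly equal parts.

```python
def batched_search(xs, s, k):
    """Noise-free search with k comparisons per round (illustrative sketch).

    xs -- ascending list of distinct values, s not in xs (assumption)
    Returns (rank, rounds), where rank is the number of elements below s.
    """
    lo, hi = 0, len(xs)                  # the rank of s lies in [lo, hi]
    rounds = 0
    while lo < hi:
        count = hi - lo + 1              # number of candidate ranks left
        # k probe positions cutting the candidates into k + 1 nearly equal parts
        probes = {lo + (j * count) // (k + 1) - 1 for j in range(1, k + 1)}
        probes = [m for m in probes if lo <= m < hi]
        # One batch: all comparisons are asked before any answer is used.
        answers = {m: xs[m] < s for m in probes}
        lo = max([lo] + [m + 1 for m, below in answers.items() if below])
        hi = min([hi] + [m for m, below in answers.items() if not below])
        rounds += 1
    return lo, rounds
```

With $k = 1$ this reduces to ordinary binary search; larger $k$ trades more comparisons per round for roughly $\log_{k+1}(n)$ rounds.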

1.1 Previous Results 

It is known that one can search in $O(\log(n)/I(p))$ queries assuming probabilistic noise. One way of doing it is iterating
every query many times to obtain a constant error probability, and then traversing the search tree, backtracking when needed
[FRPU94]. This leads to large constants, and has no easy generalization to the batch learning model. Aslam showed a
reduction of probabilistic errors to an adversarial model (see [Asl95, Ped99]), and stated as an open question whether it is possible
to achieve a tight algorithm. Aslam's algorithm suffers from the same multiplicative factor that arises in the adversarial
algorithm, and might not be applicable to generalizations of noisy search.

Although it is known that quantum binary search has complexity $\Theta(\log(n))$, determining the exact constant remains
an open problem ([FGGS99, JLB05, CLP06, BBHT98, Amb99, HNS02]). Farhi et al. presented in [FGGS99] two
quantum algorithms for searching an ordered list. They first presented a "greedy" algorithm with small error probability
that clearly outperformed classical algorithms. However, they could not analyze its asymptotic complexity, and therefore
did not use it. Instead, they devised another algorithm, which can find the correct element in a sorted list of length 52 in
just 3 queries. Iterating this as a subroutine gives a $0.53 \log_2 n$ quantum search algorithm. This was later improved by
[CLP06], searching lists of 605 elements using 4 comparisons to get $0.433 \log_2 n$ queries. We note that these algorithms are
exact. Since Farhi et al.'s greedy algorithm has small error probability, iterating it on a fixed size list results in a noisy binary
search algorithm. However, without an exact analysis of noisy binary search, the resulting bounds are not strong enough.



1.2 Our Results

The main intuition of our work is simply to force the algorithm to ask queries where it has no information about the answer,
thus causing it to be more exact. We do so by using a Bayesian learner which tries to learn the place of the element we are
looking for. Note that in this case myopic behavior is optimal, although previous (non-optimal) algorithms were a lot more
complex.

Assume that the element we are searching for is equally likely to be any element in the list. Partition the list so that
both parts have probability 1/2 to contain the right element, and ask which part contains our element by comparing it to the
"middle" element (where the middle is determined by the probability measure). Following the standard Bayesian approach,
update the probabilities of all elements given the outcome. Iterate this (partitioning the array into "equal" parts, measuring
and updating probabilities) until there are just a few elements with relatively high probability to be the right element, and
then compare directly to these elements. In each partition, we gain an expected $I(p)$ bits of information, where
$I(p) = 1 - H(p)$ and $H$ is the binary entropy function. Formally:
Theorem 1.1. There exists a (classic) algorithm which finds the right element in a sorted list of $n$ elements with probability
$1 - \delta$ using an expected



noisy queries, where each query gets the right answer with probability $p$. This is tight up to $\log\log$ terms.

We present a similar Bayesian strategy when we are allowed to use a few queries in parallel (see Section 2.5). Once we have
an exact noisy search algorithm, we can recursively use the noisy greedy quantum binary search of Farhi et al. Measuring
after $r$ queries in their algorithm corresponds to sampling the intervals according to a probability distribution which is
concentrated near the correct interval. If the entropy of this distribution over the $k$ equal probability intervals is $H_r$, then
the average information is $I_r = \log(k) - H_r$, and the expected number of queries is $r \cdot \log(n)/I_r$. With this we can show:
Theorem 1.2. The expected quantum query complexity of searching an ordered list is less than $0.32 \log(n)$.

We use our algorithms to prove some new quantum lower bounds on noisy search, and on search which can have a
probability of failure.

Section 2 gives the classical algorithm, and proves the classical lower bounds. Section 3 presents a quantum algorithm
for searching an ordered list. Section 4 improves the known lower bounds for quantum binary search when the algorithm
is allowed to err (even with high probability).

2 Classic Algorithm
2.1 Problem Settings

Let $x_1 > \ldots > x_n$ be $n$ elements, and assume we have a value $s$ such that $x_1 > s > x_n$, and we want to find $i$ such that
$x_i > s > x_{i+1}$. The only way to compare $x_i$ and $s$ is by using the function $f(i) \in \{0, 1\}$, which returns 1 if $x_i > s$ and
0 if $x_i < s$. The problem is that when calculating $f$ we have a probability of $1 - p$ for error. Note that calculating $f$ twice
at the same place may return different answers. As our approximation for $f$ has a chance of error, we let our algorithm err
with probability $\delta$. First, we present an algorithm which is highly inefficient with respect to $\delta$ but almost optimal (up to
$\log\log$ factors) with respect to $n$ and $p$, and then explain how to improve it.

The algorithm we present is based on using Bayes's formula to update $\Pr(x_i > s > x_{i+1})$ for every $i$. To do that, we
need a prior for this distribution. To achieve a uniform initial distribution, we apply a trick due to Farhi et al. in [FGGS99],
which doubles the initial search space, but turns the algorithm into a translationally invariant one (thus making the prior
uniform). The idea is to add another element $x_{i+n}$ for each $x_i$, such that all $2n$ elements are ordered in a circle. We then
apply the algorithm with a random shift on the circle, and thus begin with a uniform prior.

Formally, Farhi et al. solve a different problem which is equivalent to search. They define $n$ functions $f_j(x)$ by



for $j \in \{1, \ldots, n\}$. A query in this problem consists of giving the oracle a value $x$ and getting $f_j(x)$ for some fixed but unknown
$j$, and the goal of the algorithm is to find $j$. They then double the domain of the functions and define $F_j(x)$ by



and use the fact that $F_{j+1}(x) = F_j(x - 1)$ to analyze their algorithm only for $j = 1$. To do a similar trick, define
$x_{n+1}, \ldots, x_{2n}$ by $x_{i+n} = -x_i$. Note that if the algorithm returns $r$ when given $f_r(x)$ as an oracle (remember that the
algorithm does not know that it queries $f_r$), it would return $r - k \pmod{2n}$ if a shift $x_k$ were applied to all its queries
(that is, whenever the algorithm wishes to query a value $x$ it gets the value of $f_r(x - x_k)$ instead).

Before the algorithm begins, we choose a random shift $x_1 > x_k > x_n$, and instead of calling $f_r(x)$ we use the oracle
with $f_r(x - x_k)$. This means that for any initial $j$ value such that $x_j > s > x_{j+1}$, the probability that the right answer
for the modified algorithm is either $i$ or $i + n$ is $1/n$. This is true because the new probability distribution is a convolution
between the old probability distribution (the value $j$) and the uniform one (choosing $x_k$). We assume that this shift has been
done and return to our former definitions (i.e. $x_1 > \ldots > x_n$ with the special element $s$ uniformly distributed).

Definitions The algorithm uses an array of $n$ cells $a_1, \ldots, a_n$, where $a_i$ denotes the probability that $x_i > s > x_{i+1}$.
The initialization of the array is $a_i = 1/n$, as we have a flat prior distribution. Every step, the algorithm chooses an index $i$
according to the values of $a_1, \ldots, a_n$, and queries $f(i)$. After calling $f(i)$ the algorithm updates the probabilities $a_j$. This
means that if $f(i)$ returned 0 (i.e. $x_i < s$ with probability $p$), we multiply $a_j$ for $j \le i$ by $p$, multiply $a_j$ for $j > i$ by
$1 - p$, and normalize so that the values $a_1, \ldots, a_n$ sum up to 1. The exact action we take depends on the sum $q = \sum_{j=1}^{i} a_j$.
Assuming again $f$ returned zero, the normalization is

$$a_j \to \frac{p\, a_j}{pq + (1-p)(1-q)} \ \text{ for } j \le i, \qquad a_j \to \frac{(1-p)\, a_j}{pq + (1-p)(1-q)} \ \text{ for } j > i.$$

Note that if $|p - 1/2| \gg |q - 1/2|$, as will be the case in our algorithm, the normalization is almost multiplying the
probabilities by 2. For example, in the case $f(i) = 0$ we almost have $a_j \to 2p\,a_j$ for $j \le i$ and $a_j \to 2(1-p)\,a_j$ for $j > i$.
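
The update just described can be written out as a small function. This is a hedged sketch rather than the authors' code: the name `bayes_update` is hypothetical, and it assumes the convention that cells $1, \ldots, i$ are the ones multiplied by $p$ when $f(i) = 0$ (and by $1 - p$ when $f(i) = 1$).

```python
def bayes_update(a, i, answer, p):
    """Posterior over the cells after a noisy answer to f(i) (illustrative sketch).

    a      -- probabilities a_1..a_n (stored 0-indexed), summing to 1
    i      -- 1-indexed query position; cells 1..i form the "left" part
    answer -- 0 or 1, the possibly wrong value returned by f(i)
    p      -- probability that a single answer is correct
    """
    # Assumed convention: answer 0 favours the left part, answer 1 the right part.
    left_factor = p if answer == 0 else 1.0 - p
    right_factor = 1.0 - left_factor
    q = sum(a[:i])                                    # prior mass of the left part
    norm = left_factor * q + right_factor * (1.0 - q)  # probability of this answer
    return [(left_factor if j < i else right_factor) * w / norm
            for j, w in enumerate(a)]
```

For a balanced partition ($q = 1/2$) the normalization constant is exactly $1/2$, so the left cells are scaled by $2p$ and the right cells by $2(1 - p)$ when the answer is 0.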

2.2 Algorithm 

The main idea of the algorithm is an intuitive generalization of binary search. In every stage partition the elements in
the "middle" and ask whether the middle element is smaller or larger than $s$. The definition of "middle" depends on the
probabilities of the elements: we want to query an element $x_i$ such that $\Pr(x_i > s) = 1/2$. There are two technicalities
we must address:

1. It is not always possible to find an element such that $\Pr(x_i > s) = 1/2$. Therefore, we use a constant called $\epsilon_{par}$
("par" stands for partition) which is an upper bound on $|\sum_{j=1}^{i} a_j - 1/2| = |q - 1/2|$. Its value will be chosen such
that we are optimal with respect to $p$. Enlarging this value will cause us to extract less information from each query.

2. It is hard to distinguish between elements which are very close to each other. Therefore, the algorithm does not
necessarily find the index of $s$, but rather an index $i$ such that there are at most $l_{sur}$ elements between $x_i$ and $s$ ("sur"
stands for surroundings). We can then iterate the algorithm, this time searching the elements $x_{i-l_{sur}}, \ldots, x_{i+l_{sur}}$.
Making sure $l_{sur}$ is $O(\log(n))$ gives the right running time, even if the constant in the $O$ notation is large (as this
gives an additive $O(\log\log(n))$ term to the runtime).






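We write explicitly the update for $f(i) = 1$:

$$a_j \to \frac{(1-p)\, a_j}{(1-p)q + p(1-q)} \ \text{ for } j \le i, \qquad a_j \to \frac{p\, a_j}{(1-p)q + p(1-q)} \ \text{ for } j > i,$$

where $q = \sum_{j=1}^{i} a_j$. The exact values for $\epsilon_{par}$ and $l_{sur}$ will be chosen later.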



1. If there is an index $i$ such that $a_i > \epsilon_{par}$, we prove that $x_{i-l_{sur}} > s > x_{i+l_{sur}}$ with probability greater
than $1 - \delta/3$. It is now possible to run recursively with $\delta' = \delta/3$ and search in only $2l_{sur} + 1$ elements.

2. Else find an index $i$ such that $1/2 - \epsilon_{par} < \sum_{j=1}^{i} a_j \le 1/2$.

3. Query $f(i)$ and update the probabilities. Return to step 1.
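
The three steps above can be simulated directly. The following is a minimal sketch under stated assumptions, not the paper's implementation: the names `make_noisy_oracle` and `bayesian_search` are hypothetical, the simulated oracle answers 0 when the target cell is at most $i$ (the convention under which the left cells are multiplied by $p$), and the recursive search of the surroundings in step 1 is omitted, so the function only returns the heavy cell and the number of queries used.

```python
import math
import random

def make_noisy_oracle(target, p, rng):
    """Simulated f(i): true answer is 0 when target <= i (assumed convention),
    and the answer is flipped with probability 1 - p."""
    def f(i):
        truth = 0 if target <= i else 1
        return truth if rng.random() < p else 1 - truth
    return f

def bayesian_search(n, f, p, eps_par, max_queries=100_000):
    """Partition / query / update loop; stops when some cell exceeds eps_par.
    Returns (1-indexed heaviest cell, number of queries used)."""
    a = [1.0 / n] * n                        # flat prior
    for used in range(max_queries):
        best = max(range(n), key=a.__getitem__)
        if a[best] > eps_par:                # step 1: a heavy cell exists
            return best + 1, used
        prefix, i = 0.0, 0                   # step 2: largest i with prefix <= 1/2
        while prefix + a[i] <= 0.5:
            prefix += a[i]
            i += 1
        answer = f(i)                        # step 3: query, then Bayes update
        left = p if answer == 0 else 1.0 - p
        right = 1.0 - left
        norm = left * prefix + right * (1.0 - prefix)
        a = [(left if j < i else right) * w / norm for j, w in enumerate(a)]
    raise RuntimeError("query budget exhausted")

if __name__ == "__main__":
    rng = random.Random(0)
    n, p = 1024, 0.9
    eps_par = math.sqrt(1 / (24 * math.log2(n)))
    print(bayesian_search(n, make_noisy_oracle(700, p, rng), p, eps_par))
```

Each iteration here costs $O(n)$ time because the whole array is rescanned; this keeps the sketch short and has no effect on the query count, which is the quantity the analysis is about.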

Previous noisy search algorithms have already used weights, see for example [KMRSW80, BK93, KK07]. However, we
choose weights optimally, and use information even when $p$ is very small (see for example the usage of $\epsilon_{good}$ in [KK07]).
This gives us better results, and enables optimal generalization to the batch model.

Lemma 2.1. If the algorithm reached stage 2, it is possible to find $i$ such that $1/2 - \epsilon_{par} < \sum_{j=1}^{i} a_j \le 1/2$.

Proof. Assume such $i$ does not exist. Let $k$ be the maximal value for which $\sum_{j=1}^{k} a_j \le 1/2$. This means that $\sum_{j=1}^{k+1} a_j >
1/2$ and $\sum_{j=1}^{k} a_j \le 1/2 - \epsilon_{par}$, and therefore that $a_{k+1} > \epsilon_{par}$, and we should have stopped in step 1. □

We now need to prove two main claims: that we will end the algorithm in step 1 in a reasonable time, and that when we
do so, with high probability the value $s$ will be in the surroundings of $i$. The first claim is stated as Lemma 2.4 and is based
on Lemmas 2.2 and 2.3. To state these lemmas we need to use the entropy $H(a_1, \ldots, a_n) = \sum_{i=1}^{n} -a_i \log(a_i)$ and
the information $I(a_1, \ldots, a_n) = \log(n) - H(a_1, \ldots, a_n)$.
Lemma 2.2. If $\forall i,\ a_i \le \epsilon_{par}$ then $H(a_1, \ldots, a_n) \ge \log(1/\epsilon_{par})$.

Proof. $H(a_1, \ldots, a_n) = \sum_{i=1}^{n} -a_i \log(a_i) \ge \sum_{i=1}^{n} -a_i \log(\epsilon_{par}) = \log(1/\epsilon_{par}) \sum_{i=1}^{n} a_i = \log(1/\epsilon_{par})$,

where the first inequality comes from the monotonicity of the log function and $\forall i,\ a_i \le \epsilon_{par}$. □

This means that if $H(a_1, \ldots, a_n) < \log(1/\epsilon_{par})$ there exists $i$ such that $a_i > \epsilon_{par}$.
Lemma 2.3. In every iteration of the algorithm, the expected rise of the information function $I(a_1, \ldots, a_n)$ is greater
than $I(p) - 4\epsilon_{par}^2(1 - 2p)^2$, which is at least $I(p)\left(1 - \frac{1}{3\log(n)}\right)$ for $\epsilon_{par} = \sqrt{1/(24\log(n))}$.



Proof. Let $b_1, \ldots, b_n$ be the new probability values (after we update $a_1, \ldots, a_n$ according to the result of $f$). Assume that
the partition was between $k$ and $k + 1$. Let $\sum_{i=1}^{k} a_i = q$, and let $N_{nor} = \frac{1}{pq + (1-p)(1-q)}$ be the normalization constant used
by the algorithm in case $f(k)$ returned zero. We look at the information for this case:

$$I(b_1, \ldots, b_n \mid f(k) = 0) = \log(n) + \sum_{i=1}^{k} N_{nor}\, p\, a_i \log(N_{nor}\, p\, a_i) + \sum_{i=k+1}^{n} N_{nor}(1-p)\, a_i \log(N_{nor}(1-p)\, a_i)$$

where the $a_i$'s are the values before the update and the $b_i$'s are the values after it. We analyze the first sum, writing
$H(a_1, \ldots, a_k) = -\sum_{i=1}^{k} a_i \log(a_i)$ for the partial entropy sum:

$$\sum_{i=1}^{k} N_{nor}\, p\, a_i \log(N_{nor}\, p\, a_i) = N_{nor}\, p \log(N_{nor}\, p) \sum_{i=1}^{k} a_i + N_{nor}\, p \sum_{i=1}^{k} a_i \log(a_i) = N_{nor}\, pq \log(N_{nor}\, p) - N_{nor}\, p\, H(a_1, \ldots, a_k)$$

Treating the second sum in the same way gives

$$I(b_1, \ldots, b_n \mid f(k) = 0) = \log(n) + N_{nor}\, pq \log(N_{nor}\, p) - N_{nor}\, p\, H(a_1, \ldots, a_k) + N_{nor}(1-p)(1-q) \log(N_{nor}(1-p)) - N_{nor}(1-p)\, H(a_{k+1}, \ldots, a_n)$$

To analyze the expected information gain, we look at the probability for $f(k) = 0$. Luckily, it is $pq + (1-p)(1-q)$,
which is $1/N_{nor}$. Calculating the information for $f(k) = 1$ would give similar results, but the normalization factor would
change to $M_{nor} = \frac{1}{p(1-q) + (1-p)q}$. The expected information after the query is

$$I(b_1, \ldots, b_n \mid f(k) = 0)/N_{nor} + I(b_1, \ldots, b_n \mid f(k) = 1)/M_{nor}$$

Looking at $I(b_1, \ldots, b_n \mid f(k) = 0)/N_{nor}$ we can see that

$$I(b_1, \ldots, b_n \mid f(k) = 0)/N_{nor} = \log(n)/N_{nor} + pq \log(N_{nor}) + pq \log(p) - p\, H(a_1, \ldots, a_k) + (1-p)(1-q) \log(N_{nor}) + (1-p)(1-q) \log(1-p) - (1-p)\, H(a_{k+1}, \ldots, a_n)$$

Using $1/N_{nor} + 1/M_{nor} = pq + (1-p)(1-q) + p(1-q) + (1-p)q = 1$ we have

$$I(b_1, \ldots, b_n \mid f(k) = 0)/N_{nor} + I(b_1, \ldots, b_n \mid f(k) = 1)/M_{nor} = \log(n) - H(p) - H(a_1, \ldots, a_n) + pq \log(N_{nor}) + (1-p)(1-q) \log(N_{nor}) + p(1-q) \log(M_{nor}) + (1-p)q \log(M_{nor})$$

which means that the expected information increase after the query is $pq \log(N_{nor}) + (1-p)(1-q) \log(N_{nor}) + p(1-q) \log(M_{nor}) + (1-p)q \log(M_{nor}) - H(p)$. Before we simplify this further (and choose a value for $\epsilon_{par}$ to make it close
enough to $I(p)$) note that the expected increase does not depend on the actual values of $a_1, \ldots, a_n$, or on the information
before the query (other than $q$).

$$pq \log(N_{nor}) + (1-p)(1-q) \log(N_{nor}) + p(1-q) \log(M_{nor}) + (1-p)q \log(M_{nor}) = (pq + (1-p)(1-q)) \log(N_{nor}) + (p(1-q) + (1-p)q) \log(M_{nor}) = -(1/N_{nor}) \log(1/N_{nor}) - (1/M_{nor}) \log(1/M_{nor}) = H(1/N_{nor})$$

We now need to bound $H(1/N_{nor})$. For an ideal partition $q = 1/2$ we will have $H(1/N_{nor}) = 1$, and the expected
information increase in each query would be $I(p)$, which is optimal. However, $q$ deviates from $1/2$ by at most $\epsilon_{par}$, and
we should now choose $\epsilon_{par}$ small enough to get the desired runtime. As $q > 1/2 - \epsilon_{par}$, we have

$$H(1/N_{nor}) \ge H(p + 1/2 + \epsilon_{par} - 2p(1/2 + \epsilon_{par})) = H(1/2 + \epsilon_{par}(1 - 2p)) \ge 1 - 4\epsilon_{par}^2(1 - 2p)^2$$

where the last inequality uses that if $1/2 \ge x \ge -1/2$ then $1 - 2x^2 \ge H(1/2 + x) \ge 1 - 4x^2$.
Manipulating this inequality gives $x^2 \le \frac{1 - H(1/2 + x)}{2}$. Using this and substituting $\epsilon_{par} \le \sqrt{1/(24\log(n))}$,

$$4\epsilon_{par}^2(1 - 2p)^2 = 16\epsilon_{par}^2(p - 1/2)^2 \le \frac{16(p - 1/2)^2}{24\log(n)} = \frac{2(p - 1/2)^2}{3\log(n)} \le \frac{1 - H(p)}{3\log(n)} = \frac{I(p)}{3\log(n)}$$

Putting it all together, the expected information increase in every stage is at least

$$H(1/2 + \epsilon_{par}(1 - 2p)) - H(p) \ge 1 - 4\epsilon_{par}^2(1 - 2p)^2 - H(p) \ge I(p) - \frac{I(p)}{3\log(n)} = I(p)\left(1 - \frac{1}{3\log(n)}\right)$$

which ends the proof. □
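
The identity at the heart of this proof (the expected information increase equals $H(1/N_{nor}) - H(p)$ and depends on the $a_i$ only through $q$) can be checked numerically. The sketch below is an illustration only: the helper names `binary_entropy`, `information` and `expected_gain` are hypothetical, and logarithms are taken base 2.

```python
import math

def binary_entropy(x):
    """H(x) in bits, with H(0) = H(1) = 0."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def information(a):
    """I(a_1..a_n) = log(n) - H(a_1..a_n), in bits."""
    return math.log2(len(a)) + sum(w * math.log2(w) for w in a if w > 0)

def expected_gain(a, k, p):
    """Expected rise of I after querying the partition between k and k + 1."""
    q = sum(a[:k])
    pr0 = p * q + (1 - p) * (1 - q)          # Pr[f(k) = 0] = 1 / N_nor
    pr1 = 1 - pr0                            # Pr[f(k) = 1] = 1 / M_nor
    b0 = [(p if j < k else 1 - p) * w / pr0 for j, w in enumerate(a)]
    b1 = [((1 - p) if j < k else p) * w / pr1 for j, w in enumerate(a)]
    return pr0 * information(b0) + pr1 * information(b1) - information(a)

if __name__ == "__main__":
    a, k, p = [0.1, 0.2, 0.3, 0.4], 2, 0.8   # q = 0.3, deliberately unbalanced
    q = sum(a[:k])
    closed_form = binary_entropy(p * q + (1 - p) * (1 - q)) - binary_entropy(p)
    print(expected_gain(a, k, p), closed_form)
```

For a perfectly balanced partition ($q = 1/2$) the same function returns $1 - H(p) = I(p)$, which is the optimal gain per query.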

Note that $\epsilon_{par}$ is not a function of $p$.
Lemma 2.4. The algorithm will reach the recursion condition in stage 1 in an expected number of $\log(n)/I(p) +
O(1/I(p))$ function calls.

Proof. By Lemma 2.2 we need $H(a_1, \ldots, a_n) < \log(1/\epsilon_{par})$. As the initial entropy is $\log(n)$ and the expected information
rise every stage is $I(p)(1 - 1/(3\log(n)))$ (by Lemma 2.3), we have that the expected number of stages is at most

$$\frac{\log(n) - \log(1/\epsilon_{par})}{I(p)(1 - 1/(3\log(n)))} \le \frac{\log(n)}{I(p)(1 - 1/(3\log(n)))} \le \frac{\log(n)}{I(p)} + \frac{1}{I(p)}$$

where we used $1/(c - x) \le 1/c + 2x/c^2$ for $c \ge 2x \ge 0$. □

Lemma 2.5. Suppose $a_i > \epsilon_{par}$ in step 1. Let $r = \frac{p(1-p)\log^2(1/\delta)}{2p-1}$, and $l_{sur} = \left(\frac{p}{1-p}\right)^r \frac{1}{\epsilon_{par}}$. Then with probability $\ge 1 - \delta$
we have $x_{i-l_{sur}} > s > x_{i+l_{sur}}$.

Proof. As the lemma is symmetric we assume without loss of generality that $s > x_{i-l_{sur}}$ and show that the probability
for such a distribution $a_1, \ldots, a_n$ is small. As the $a_j$'s sum up to 1, there is $k$ such that $i - l_{sur} < k < i$ and $a_k < 1/l_{sur}$.
This means that $a_i/a_k > \epsilon_{par} l_{sur} = \left(\frac{p}{1-p}\right)^r$. This ratio was created by function calls $f(j)$ for elements $k < j < i$,
such that $f$ returned at least $x + r$ times 1, and at most $x$ times 0. Considering the number of ones in $2x + r$ function calls
in this regime as a random variable, we get an expectancy of $(1-p)(2x + r) < 0.5(2x + r)$ and a standard deviation of
$\sqrt{p(1-p)(2x + r)}$. We apply the Chernoff bound after making sure that for every value of $x$, $x + r$ is
greater than the expectancy by at least $\log(1/\delta)$ standard deviations, or that

$$\min_x \frac{x + r - (1-p)(2x + r)}{\sqrt{p(1-p)(2x + r)}} \ge \log(1/\delta)$$

Function analysis of this gives $x = \frac{r(1-p)}{2p-1}$, and the minimum is $\sqrt{\frac{r(2p-1)}{p(1-p)}}$. This gives $r = \frac{p(1-p)\log^2(1/\delta)}{2p-1}$. Using the
fact that for $1/2 < p < 1$ and $a > 0$

$$\left(\frac{p}{1-p}\right)^{a\,p(1-p)/(2p-1)} \le e^{a/2}$$

we get $l_{sur} \le \log_2(e)/(2\delta^2\epsilon_{par}) = O(1/(\delta^2\epsilon_{par}))$. The dependency on $\delta$ can be improved by another variant of the
algorithm which will be described later. □



Lemma 2.5 gives us the success probability of the algorithm. Its expected runtime is the sum of two terms. By Lemma
2.4 the expected runtime until $I(a_1, \ldots, a_n) > \log(n) - \log(1/\epsilon_{par})$ is $\log(n)/I(p) + \mathrm{const}/I(p)$. By Lemma 2.5, as
$l_{sur} = O\left(\frac{1}{\delta^2\epsilon_{par}}\right) = O\left(\frac{\sqrt{24\log(n)}}{\delta^2}\right)$, searching between $i - l_{sur}$ and $i + l_{sur}$ adds another term of $O(\log\log(n)/I(p))$ to
the runtime.

Implementation Notes We are interested in the query complexity of the algorithm, rather than its runtime. However,
we note that a naive implementation of it is polylogarithmic in $n$ (actually $O(\log^2(n))$). This is done by uniting cells of
the array $a_1, \ldots, a_n$ when there was no query which discriminates between them. We begin the algorithm with a single
segment which consists of the entire array. Every query takes a segment, and turns it into two segments (so at the end of
the algorithm we are left with $O(\log(n))$ segments). After each query the weight of each segment is updated ($O(\log(n))$
time) and choosing where to ask the next query consists of going over the segments (again $O(\log(n))$ time).

This can be improved to $O(\log(n)\log\log(n))$ by saving the segments in a binary search tree. Every edge of the tree has a probability
on it, such that multiplying the numbers on a path from the root to a certain vertex gives the weight of all the segments
which are under the vertex (each leaf of the tree consists of a single segment). Suppose we need to query $x_j$, such
that we already queried $x_k$ and $x_l$ with $k < j < l$, and no other elements were queried between $x_k$ and $x_l$. In this case the leaf which
represents the segment $a_k, \ldots, a_l$ will have two sons, one representing $a_k, \ldots, a_j$ and the other representing $a_{j+1}, \ldots, a_l$.
According to the result of the query, one son will have probability $p$, and the other $1 - p$. The data structure will then fix
the probabilities on the path between the root and the vertex $a_k, \ldots, a_l$ according to the answer of the query. Both finding
the right element and updating the probabilities take time proportional to the depth of the tree. Each query adds 1
to the number of leaves, and therefore, as there are $O(\log(n))$ queries, this will be the number of leaves. Keeping the search
tree balanced (such as by using red-black trees) gives depth of $O(\log\log(n))$ as required.
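
The naive segment-based variant (before the tree refinement) might look like the sketch below. It is an illustration under assumptions, not the authors' code: segments are `(lo, hi, weight)` tuples over 1-indexed cells, all cells inside a segment share the same probability, the names `choose_query` and `update` are hypothetical, and the floating point division used to pick the cut may be off by one cell in unlucky cases.

```python
def choose_query(segments):
    """Largest cut i whose prefix mass is <= 1/2, scanning segments only.
    Returns (i, prefix_mass)."""
    prefix = 0.0
    for lo, hi, w in segments:
        if prefix + w > 0.5:
            per_cell = w / (hi - lo + 1)
            take = int((0.5 - prefix) / per_cell)   # whole cells that still fit
            return lo + take - 1, prefix + take * per_cell
        prefix += w
    raise ValueError("segment weights must sum to 1")

def update(segments, i, answer, p):
    """Bayes update after querying cut i; splits at most one segment."""
    left = p if answer == 0 else 1.0 - p            # assumed convention: 0 favours cells <= i
    right = 1.0 - left
    out = []
    for lo, hi, w in segments:
        per_cell = w / (hi - lo + 1)
        if lo <= i < hi:                            # the cut falls inside this segment
            out.append((lo, i, left * per_cell * (i - lo + 1)))
            out.append((i + 1, hi, right * per_cell * (hi - i)))
        else:
            out.append((lo, hi, (left if hi <= i else right) * w))
    total = sum(w for _, _, w in out)
    return [(lo, hi, w / total) for lo, hi, w in out]
```

Each query adds at most one segment, so both functions run in time linear in the number of queries made so far, which matches the $O(\log(n))$ per-step cost described above.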

Theorem 2.6. (Lower bound) Let $A$ be a classical algorithm which finds the right element in a sorted list, using noisy
comparisons. Assume that $A$'s success probability is $\ge 1 - \tau$; then $A$ takes at least an expected $\frac{\log(n) - \log(1/(1-\tau))}{I(p)}$
comparisons.

Proof. We quantify the maximum amount of information gained every query. Every oracle call gives us at most an expected
$I(p)$ bits of information. This means that after $\frac{\log(n) - \log(1/(1-\tau))}{I(p)}$ oracle queries, the algorithm has $\log(n) - \log(1/(1-\tau))$
information bits. Knowing where the right element is amounts to $\log(n)$ bits of information. This means that the algorithm has to
guess at least $\log(1/(1-\tau))$ bits of information, which is done with success probability $1 - \tau$. □

Corollary 2.7. (Lower bound without noise) Let $A$ be a classical algorithm which finds the right element with success
probability $\ge 1 - \tau$; then $A$ takes at least an expected $\log(n) - \log(1/(1-\tau))$ comparisons. Moreover, with probability
$1 - 2\tau$ the algorithm uses at least $\log(n) - 2\log(1/(1-\tau))$ comparisons.
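
To get a feel for the size of this bound, the expression $(\log(n) - \log(1/(1-\tau)))/I(p)$ can be evaluated directly. The helper names below (`capacity`, `lower_bound_queries`) are hypothetical, logarithms are base 2, and $I(p) = 1 - H(p)$ is assumed as in the analysis of Lemma 2.3.

```python
import math

def capacity(p):
    """I(p) = 1 - H(p) in bits, the most information one noisy answer can carry."""
    if p in (0.0, 1.0):
        return 1.0
    return 1.0 + p * math.log2(p) + (1 - p) * math.log2(1 - p)

def lower_bound_queries(n, p, tau):
    """Expected-query lower bound (log n - log(1/(1 - tau))) / I(p)."""
    return (math.log2(n) - math.log2(1 / (1 - tau))) / capacity(p)

if __name__ == "__main__":
    # A million-element list, 10% noise, failure probability 1%.
    print(lower_bound_queries(2 ** 20, 0.9, 0.01))
```

For noiseless comparisons ($p = 1$) the function reduces to the bound of Corollary 2.7.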



2.3 Improving the Dependency on $\delta$

The problem with what we presented so far is the dependency on $\delta$ in $l_{sur}$. Assume first $\delta < \log^{-3}(n)$. Let $l_{sur} =
(1/\gamma^2)^{1/(2p-1)}$ for a constant $\gamma$. Keeping the same halt condition, the probability to find the right element when it is
reached will be constant, and with probability $1 - \delta$ we will find the right place for $s$ after $\log(1/\delta)$ trials. Note that this
means that the algorithm will not end after we are first stuck in stage 1. We therefore update the probabilities of $a_1, \ldots, a_n$
even when we run the algorithm recursively. In this variant the expected number of queries is $\frac{\log(n)}{I(p)} + O\left(\frac{\log(1/\delta)\log\log(n)}{I(p)}\right)$.
The dependency on $\delta$ is what one would expect from this kind of algorithm. The $\log\log(n)$ factor in the big-O notation
comes from the recursive part of the algorithm. Assume now $\delta > \log^{-3}(n)$. Run the algorithm with $\delta' = \log^{-3}(n)$. After the
algorithm finishes, check l ° B j^J^ times if it returned the right element. If the check succeeded, return this element. If the 

check failed, start all over again, until the check succeeds. The probability that the check fails is at most $\delta'$, and as $\delta' = \log^{-3}(n)$,
the increase in the expected query complexity is negligible. This gives Theorem 1.1.
2.4 Bounding the Variance of the Runtime

So far we proved that our algorithm finds the right element with probability $1 - \delta$ with an expected number of $\frac{\log(n)}{I(p)} +
O\left(\frac{\log\log(n)\log(1/\delta)}{I(p)}\right)$ queries. Using the strong lower bound in Theorem 2.6 we are able to bound the probability that the number
of queries needed is a lot greater than this number using a generalized Markov inequality, which we do not prove:
Lemma 2.8. Let X be a positive random variable such that E(X) = a. Assume that Pr(X > b) > 1 — f3, then 

-b+F 
c-b 

log(") i ci log log(n) 



Pr(I>c)<*rc> 1 



Assume that the expected number of queries needed is ' + logfi/s) where c i lS a constant. 
Lemma 2.9. Let x > 1 and 8 > 0. The algorithm presented before will find the required element s in an expected number 
°f X °^(p) + ^( i(p)i°g(i/s) ) auer i es - The probability that the number of queries is greater than '° g ^ + x(ci+2)^iogiog(n) j s 
at most 1 jx- 

Proof. We use the lower bound of theorem [2761 setting 1 — r = 1 — 1/ log(n) (that is r = 1/ log(n)). According to the 
theorem, this means that the number of queries is greater than X °f(^ — 21og ^°s( n )) with probability 1 — 2/ log(n). Using 
lemmaEl with a = ^ + O Ggfigg^ ), 6 = ^ - 21 °f ( °f» , /? = V log(n) and c = ^ + 4 X (c 1+ 2)iogiog(») 
we get that the probability the algorithm requires more than lo f^ + x 4 ( Cl +^°s lp g(") queries is smaller than 1 j\- d 



2.5 Generalized Noisy Binary Search 



In this section we generalize binary search. In the regular search, the algorithm divides a sorted array of items into two 
parts, and the oracle tells it in which part is the desired element. Our generalization is to let the algorithm divide the sorted 
array into k + 1 parts, and the oracle will tell it in which part is the correct element. 

Generalizing the noise model, there is one right part and $k$ wrong ones every query, so we need to state what would be
the error probability for each kind of mistake. This is done by adding $k + 1$ probabilities (which sum up to 1), where the
$h$'th probability stands for the chance that the oracle would return $j + h \pmod{k+1}$ instead of the $j$'th interval.¹

Formally, let $g : \{1, \ldots, n-1\}^k \to \{0, \ldots, k\}$. If $g$ is given $k$ indexes $i_1 < i_2 < \ldots < i_k$, it outputs the answer
$j$ if $x_{i_j} > s > x_{i_{j+1}}$, when we identify $i_0 = 0$ and $i_{k+1} = n$. The error probability is taken into account by associating
$k + 1$ known numbers $p_0, \ldots, p_k$ to $g$, such that if $x_{i_j} > s > x_{i_{j+1}}$ then the result $j + h \bmod (k + 1)$ would appear with
probability $p_h$.

The optimal algorithm for this case is very similar to the case $k = 1$ (which is $f$). In every step divide the array into $k + 1$
parts with (almost) equal probability, and ask in which part is the element we're looking for. Let $a_1, \ldots, a_n$, $\epsilon_{par}$ and
$l_{sur}$ be as before (albeit with different values this time).

1. If there is a value $i$ such that $a_i > \epsilon_{par}$, halt. If the algorithm halts, then with probability $1 - \delta/3$,
$x_{i-l_{sur}} \ge s \ge x_{i+l_{sur}}$; continue recursively.

2. Else, let $i_1, \ldots, i_k$ be indices such that the sum of the elements between two indices does not deviate from
$1/k$ by more than $\epsilon_{par}$:

$$1/k - \epsilon_{par} < \sum_{h=i_{j-1}+1}^{i_j} a_h < 1/k + \epsilon_{par}$$

3. Use $g(i_1, \ldots, i_k)$ and update the probabilities according to Bayes's rule.

We use $\epsilon_{par} = \sqrt{1/(24\log(n))}$. The exact value of $l_{sur}$ depends on $p_0, \ldots, p_k$, unless we use the variant of the
algorithm described in Section 2.3.
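
Step 3 of the generalized algorithm, the Bayes update after a $(k+1)$-way query with cyclic noise, can be sketched as follows. This is an assumption-laden illustration: the function name `generalized_update` is hypothetical, the parts are numbered $0, \ldots, k$ from left to right, and `probs[h]` is taken to be the probability that the oracle reports part $j + h \bmod (k + 1)$ when the true part is $j$.

```python
def generalized_update(a, cuts, answer, probs):
    """Posterior after one (k+1)-way noisy query (illustrative sketch).

    a      -- cell probabilities a_1..a_n (stored 0-indexed), summing to 1
    cuts   -- increasing 1-indexed positions i_1 < ... < i_k; part 0 is cells 1..i_1
    answer -- oracle output in 0..k
    probs  -- probs[h] = Pr[oracle says (true_part + h) mod (k+1)]
    """
    parts = len(cuts) + 1
    bounds = [0] + list(cuts) + [len(a)]
    post = []
    for part in range(parts):
        # Likelihood of seeing `answer` if s really lies in `part`.
        like = probs[(answer - part) % parts]
        post.extend(like * w for w in a[bounds[part]:bounds[part + 1]])
    total = sum(post)
    return [w / total for w in post]
```

With `cuts` of length 1 and `probs = [p, 1 - p]` this reduces to the binary update used for $f$.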

Theorem 2.10. The algorithm presented finds the right element with probability $1 - \delta$ in an expected query complexity of

$$\frac{\log(n)}{I(p_0, \ldots, p_k)} + O\left(\frac{\log\log(n)\log(1/\delta)}{I(p_0, \ldots, p_k)}\right)$$

¹We could have actually used $(k + 1)^2$ numbers, stating the chance to get interval $i$ instead of $j$ for all $i, j$. This would change the algorithm in an
obvious manner, and is not necessary for the quantum result.
3 Quantum Search With a Non Faulty Oracle



Farhi et al. presented in [FGGS99] a "greedy" algorithm, which given an array of size $K$ and $t$ queries, attempts to find
the correct element but has some error probability. Their algorithm actually gives something better. Assume that the
elements given to their algorithm are $y_0, \ldots, y_{K-1}$ and the special element $s$. Again we are trying to find $i$ which satisfies
$y_i > s > y_{i+1}$ (we use different notation than $x_1, \ldots, x_n$ as we are going to combine algorithms with $K$ being a constant
regardless of $n$). Their algorithm outputs a quantum register with the superposition $\sum_{j} \beta_j |i + j\rangle$ (with all indexes
taken mod $K$) for fixed $\beta_0, \ldots, \beta_{K-1}$ which are not a function of $s$. Let $p_j = |\beta_j|^2$; then measuring this register we obtain
the correct value with probability $p_0$. The exact numbers $p_0, \ldots, p_{K-1}$ are determined by the number of oracle queries $t$. We
now use their algorithm (with proper values for $K$ and $t$) as a subroutine in our generalized search algorithm with $k = K$.

Using $K = 2^{23}$ and $t = 6$ gives a distribution $Q$ with $I(p_0, \ldots, p_k) = 18.5625$. This gives us an algorithm which
requires less than $0.32\log(n)$ oracle questions with $o(1)$ failure probability. This gives Theorem 1.2.



4 Quantum Lower Bounds 

To prove lower bounds we use an oracle similar to the one in [HMW03] and [BNRW03]. Let $O_f$ be a quantum oracle,
$O_f(|x, c\rangle) = |x, 0 \oplus c\rangle$ if $x \in L$ and $|x, 1 \oplus c\rangle$ if $x \notin L$. To make $O$ noisy, let $O(|x, c\rangle) = \cos(\alpha)|x, c \oplus f(x)\rangle +
\sin(\alpha)|x, c \oplus f(x) \oplus 1\rangle$ where $f(x) = 1$ if and only if $x \notin L$, and $\cos(\alpha) = \sqrt{p}$.
Theorem 4.1. Any noisy quantum algorithm requires $\Omega(\log(n)/I(p))$ queries.

Proof. Define A = p — 1/2. We use notation and techniques of | HNS02 1 and assume the reader is familiar with the proof. 
We assume that a run of the algorithm consists of A = (UO) T U\0), where O is an oracle call, U is a unitary and the 
algorithm requires T oracle calls. The quantum algorithm is given an unknown oracle x out of a group S, and after the run 
a measurement is done and the algorithm guesses which oracle was given to it. [HNS02] define the state \ifj x ) to be the 
quantum state after j iterations, when the oracle was x. They define a weight function Wj = 'E XiV& s l ^{x, y) (4>i I ) where 
u}(x, y) is an un normalized distribution on input states. [HNS02] show that if we choose 



uj{x,y) 



h(y)-h( x) ^ < h(x) < h(y) < n 
otherwise 



where h(x) is the hamming weight of x then Wq — nH n — n and Wt — S/Wo, where Sf — 2y/ 5(1 — 5), Hi — Ej i the 
i'th harmonic number, and 5 is the probability for the algorithm to succeed. 

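To finish the argument, we need to bound the difference between $W_j$ and $W_{j+1}$ and thus gain a bound on $T$. Define
$P_i = \sum_{z \ge 0} |z; i\rangle\langle z; i|$, the projection operator. We now deviate slightly from their article and devise a better bound, assuming
that the quantum oracle is noisy. [HNS02] use the fact that

$$\left|\langle\psi_x^j|\psi_y^j\rangle - \langle\psi_x^{j+1}|\psi_y^{j+1}\rangle\right| \le 2\sum_{i : x_i \ne y_i} \|P_i|\psi_x^j\rangle\| \cdot \|P_i|\psi_y^j\rangle\|.$$

But when the oracle is noisy, we actually have

$$\left|\langle\psi_x^j|\psi_y^j\rangle - \langle\psi_x^{j+1}|\psi_y^{j+1}\rangle\right| \le 2\sum_{i : x_i \ne y_i} \|P_i|\psi_x^j\rangle\| \cdot \|P_i|\psi_y^j\rangle\| \cdot \left(1 - \sqrt{1 - 4\Delta^2}\right),$$

which is close to multiplying the bound by $I(p)$. The proof in [HNS02] continues by proving an upper bound of
$\pi n$ on the weight loss per query using these sums; plugging our estimate into their proof gives an extra factor of $(1 - \sqrt{1 - 4\Delta^2})$. As the maximal expected
weight loss per query is then about $\pi n \cdot I(p)$, while the total weight to be lost is of order $n\log(n)$, a quantum algorithm requires at least $\Omega(\log(n)/I(p))$ queries. □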
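
To see how the noise factor $1 - \sqrt{1 - 4\Delta^2}$ compares with $I(p)$, the following minimal Python sketch tabulates both for a few values of $p$. It assumes that $I(p) = 1 - H(p)$, with $H$ the binary entropy in bits; this definition and the function names are assumptions made for the illustration.

```python
import math


def binary_entropy(p):
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def capacity(p):
    """Assumed definition of the information per noisy answer: I(p) = 1 - H(p)."""
    return 1.0 - binary_entropy(p)


def noise_factor(p):
    """The factor 1 - sqrt(1 - 4*Delta^2), where Delta = p - 1/2."""
    delta = p - 0.5
    # max(...) guards against a tiny negative argument from rounding at p = 1.
    return 1.0 - math.sqrt(max(0.0, 1.0 - 4.0 * delta * delta))


if __name__ == "__main__":
    for p in (0.55, 0.75, 0.9, 0.99):
        ratio = noise_factor(p) / capacity(p)
        print(f"p={p:.2f}  factor={noise_factor(p):.5f}  I(p)={capacity(p):.5f}  ratio={ratio:.3f}")
```

Both quantities vanish at $p = 1/2$ and equal 1 at $p = 1$; in the sampled range the factor stays within a constant multiple of $I(p)$, which is all the asymptotic bound needs.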

Our techniques also give a better lower bound on the number of queries $t$ that a noiseless quantum algorithm
needs in order to find the right element out of $k$ (note that we search $k$ instead of $n$ elements) with probability greater than $1 - \delta$. [HNS02]
gave a lower bound of $t \ge (1 - 2\sqrt{\delta(1-\delta)})\frac{1}{\pi}(H_k - 1)$, applicable only for $\delta < 1/2$.

Theorem 4.2. Any quantum algorithm which finds the right element with probability greater than $1 - \delta$ requires
$t \ge \frac{\ln(2)}{\pi}\left((1-\delta)\log(k)\right) - O(\delta)$ queries.

Proof. Assume we have such an algorithm. Plug it as a subroutine in 2.5 using $p_0 = 1 - \delta$ and $p_j = \delta/(k-1)$ for
$j \ne 0$. This would give $I(p_0, \ldots, p_k) = \log(k) + (1-\delta)\log(1-\delta) + \delta\log(\delta/(k-1))$, and an information gain rate of
$I(p_0, \ldots, p_k)/t$ bits of information per query. However, we know from [HNS02] that any perfect quantum search algorithm
for an ordered list needs at least $\frac{1}{\pi}\ln(n)$ queries. This means that the average information gain per query can be at most
$\pi/\ln(2)$ bits per query, so that

$$\frac{1}{t}\left(\log(k) + (1-\delta)\log(1-\delta) + \delta\log(\delta/(k-1))\right) \le \frac{\pi}{\ln(2)},$$

and the number of queries $t$ is at least

$$t \ge \frac{\ln(2)}{\pi}\left(\log(k) + (1-\delta)\log(1-\delta) + \delta\log(\delta/(k-1))\right) \ge \frac{\ln(2)}{\pi}\left((1-\delta)\log(k) - I(\delta) - 1\right) \approx \frac{\ln(2)}{\pi}\left((1-\delta)\log(k)\right) - O(\delta).$$
□
This lower bound improves the previously known lower bound, and is also meaningful for relatively high error probability
$\delta < (k-1)/k$, unlike the lower bound of [HNS02], which is meaningful only for $\delta < 1/2$.
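
The two bounds can be compared numerically. The following is a minimal Python sketch, assuming that $\log$ denotes the base-2 logarithm (the text counts bits) and using the [HNS02] bound in the form $(1 - 2\sqrt{\delta(1-\delta)})\frac{1}{\pi}(H_k - 1)$; the function names and the sample values of $k$ and $\delta$ are ours.

```python
import math


def information_gain(k, delta):
    """log2(k) minus the entropy of p_0 = 1 - delta, p_j = delta/(k - 1) for j != 0."""
    probs = [1.0 - delta] + [delta / (k - 1)] * (k - 1)
    entropy = -sum(q * math.log2(q) for q in probs if q > 0.0)
    return math.log2(k) - entropy


def closed_form(k, delta):
    """Closed form of the same quantity, valid for 0 < delta < 1."""
    return (math.log2(k)
            + (1.0 - delta) * math.log2(1.0 - delta)
            + delta * math.log2(delta / (k - 1)))


def new_bound(k, delta):
    """Lower bound on t obtained from at most pi/ln(2) bits per query."""
    return math.log(2) / math.pi * closed_form(k, delta)


def hns02_bound(k, delta):
    """Earlier bound (1 - 2*sqrt(delta*(1 - delta))) * (H_k - 1) / pi."""
    harmonic = sum(1.0 / m for m in range(1, k + 1))
    return (1.0 - 2.0 * math.sqrt(delta * (1.0 - delta))) / math.pi * (harmonic - 1.0)


if __name__ == "__main__":
    k = 1024  # hypothetical sample size
    for delta in (0.01, 0.1, 0.3, 0.6):
        print(delta, new_bound(k, delta), hns02_bound(k, delta))
```

At $\delta = (k-1)/k$ the output distribution is uniform, the information gain is zero and the new bound vanishes, which matches the range of validity stated above.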
A A Review of the Greedy Algorithm



In this appendix we give a short presentation of the quantum algorithm of [FGGS99], which is used extensively in
our paper. Farhi et al. look at a problem in the oracle model which is equivalent to searching for an element in a list. They define
$N$ oracles

$$f_j(x) = \begin{cases} -1 & x < j \\ +1 & x \ge j \end{cases}$$

for $j = 0, \ldots, N-1$. The goal of the algorithm is, given access to an oracle which calculates $f_j(x)$ for an unknown $j$, to find $j$.
A query to the oracle consists of calculating $f_j(x)$ for some $x$. They continue by defining

$$F_j(x) = \begin{cases} f_j(x) & 0 \le x \le N-1 \\ -f_j(x-N) & N \le x \le 2N-1 \end{cases}$$

which is important because $F_{j+1}(x) = F_j(x-1)$, where we identify $-1$ with $2N-1$. They also define $G_j|x\rangle = F_j(x)|x\rangle$
and $T|x\rangle = |x+1\rangle$. This means that their algorithm can be described as

$$V_k G_j V_{k-1} \cdots V_1 G_j V_0 |0\rangle$$

followed by a projective measurement which decides the result. Noticing that $T^j G_0 T^{-j} = G_j$, Farhi et al. found a basis,
which they denote $|0+\rangle, \ldots, |(N-1)+\rangle, |0-\rangle, \ldots, |(N-1)-\rangle$, such that $T^j|0\pm\rangle = |j\pm\rangle$, and when the measurement results
in $j\pm$, the algorithm outputs that the oracle is $f_j$.²

Demanding that $V_i = T V_i T^{-1}$, it is possible to calculate the success probability of any given algorithm by looking
at the inner product $\langle 0\pm| V_k G_0 V_{k-1} \cdots V_1 G_0 V_0 |0\rangle$. For any given state $\phi$ it is possible to calculate which $V$ will
maximize $\langle 0\pm| V G_0 |\phi\rangle$. Farhi et al. define the greedy algorithm recursively starting from $V_0$, such that each $V_i$ is chosen
to maximize the overlap of $V_i G_0 V_{i-1} \cdots V_1 G_0 V_0 |0\rangle$ with $|0\pm\rangle$. Farhi et al. could not find an asymptotic analysis of this
algorithm, and as it has a probability to err they decided to use another algorithm as a subroutine for their search algorithm.
We calculated the "greedy" algorithm for various parameters, and also looked at the overlap $\langle j\pm| V_k G_0 V_{k-1} \cdots V_1 G_0 V_0 |0\rangle$ for
$j \ne 0$. Differences in the overlaps for different values of $j$ gave us the error probability distributions that we used above as
subroutines in our classical search algorithm.

2 Actually the result should be $|j+\rangle$ if $k$ is even and $|j-\rangle$ if $k$ is odd. We ignore this point as it is not necessary for the understanding of the algorithm.
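
The shift structure that the algorithm exploits can be checked on small instances. The following is a minimal Python sketch; it assumes the sign convention $f_j(x) = -1$ for $x < j$ and $+1$ for $x \ge j$, and the antisymmetric extension $F_j(x) = -f_j(x-N)$ for $N \le x \le 2N-1$. These conventions, and all function names, are assumptions made for the illustration.

```python
def f(j, x):
    """Assumed oracle f_j on 0..N-1: -1 below the threshold j, +1 from j on."""
    return -1 if x < j else 1


def F(j, x, N):
    """Assumed extension of f_j to 0..2N-1, negated on the upper half."""
    return f(j, x) if x < N else -f(j, x - N)


def shift_property_holds(N):
    """Check F_{j+1}(x) = F_j(x - 1), with -1 identified with 2N - 1."""
    return all(F(j + 1, x, N) == F(j, (x - 1) % (2 * N), N)
               for j in range(N - 1) for x in range(2 * N))


def conjugation_property_holds(N):
    """Check T^j G_0 T^{-j} = G_j on the basis states |x>.

    G_j is diagonal with entries F_j(x) and T maps |x> to |x + 1 mod 2N>,
    so the left side is diagonal with entry F_0((x - j) mod 2N) at |x>.
    """
    return all(F(j, x, N) == F(0, (x - j) % (2 * N), N)
               for j in range(N) for x in range(2 * N))


if __name__ == "__main__":
    for N in (1, 2, 5, 16):
        print(N, shift_property_holds(N), conjugation_property_holds(N))
```

Because every $G_j$ is a translate of $G_0$, it suffices to analyse the algorithm for the single oracle $G_0$, which is what the overlap computations in this appendix do.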